2026-02-13
The rstanarm package allows us to perform common Bayesian analyses without having to learn Stan.

Example: examine two brands of rechargeable batteries. How long do they run (in hours) before being exhausted?
Perhaps we’re not sure which brand is better. A possible prior distribution for the difference might look like this: a \(N(0,3)\) distribution.
Bayesian uncertainty interval estimated from the posterior distribution:
Let’s say we want to estimate the probability that the difference in battery life is:
as.data.frame() creates a data frame of posterior samples.
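The battery comparison can be sketched as follows. This is a minimal sketch, assuming a hypothetical data frame `batteries` with columns `hours` (numeric) and `brand` (a two-level factor); the coefficient name `brandB` is likewise hypothetical.

```r
# Sketch (hypothetical data): compare two battery brands with stan_glm.
library(rstanarm)

fit <- stan_glm(hours ~ brand, data = batteries,
                prior = normal(0, 3))  # N(0, 3) prior on the difference

# Bayesian uncertainty interval estimated from the posterior (90% by default)
posterior_interval(fit)

# Data frame of posterior samples, one column per parameter
post <- as.data.frame(fit)

# Estimated probability the difference in battery life is greater than 0 hours
mean(post[["brandB"]] > 0)
```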
The previous example was a simple model with one predictor. It was essentially the Bayesian approach to what is traditionally analyzed as a t-test.
Today’s workshop will cover more complicated analyses including:
Multiple regression, or linear modeling, is the idea that the variability of some numeric variable of interest can be “explained” by a sum of weighted predictors.
Example: patient satisfaction scores at a hospital. Some are high, some are low. Why? Perhaps it has to do with their age, anxiety level, and illness severity.
\[\text{satisfaction} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{anxiety} + \beta_3 \text{illness}\]
where each beta represents some weight; hence the term weighted sum. \(\beta_0\) is the intercept.
\[\text{satisfaction} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{anxiety} + \beta_3 \text{illness}\]
This model says, “if I take age, anxiety level and illness severity, multiply each by some weight, and add them up, I’ll get an expected patient satisfaction score.”
The calculated value will be off by some amount. We assume this amount, usually denoted \(\epsilon\), is a random draw from a Normal distribution with mean 0 and some unknown standard deviation, \(\sigma\). This gives us
\[\text{satisfaction}_i = \beta_0 + \beta_1 \text{age}_i + \beta_2 \text{anxiety}_i + \beta_3 \text{illness}_i + \epsilon_i\]
Traditional multiple regression means estimating the betas and \(\sigma\).
lm

The traditional approach in R uses the lm function.
The betas (coefficients/weights) and \(\sigma\) can be viewed with coef and sigma
glm

We can also use glm for multiple regression. Note the family argument, which allows us to specify the error distribution.
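The lm and glm approaches can be sketched as follows, assuming a hypothetical data frame `ps` with columns satisfaction, age, anxiety, and illness.

```r
# Sketch (hypothetical data): traditional multiple regression with lm
m <- lm(satisfaction ~ age + anxiety + illness, data = ps)

coef(m)    # the betas: intercept and weights
sigma(m)   # the residual standard deviation

# The same model via glm; family = gaussian (the default) specifies
# a Normal error distribution
m2 <- glm(satisfaction ~ age + anxiety + illness, data = ps,
          family = gaussian)
```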
As before, instead of estimating parameters, the Bayesian approach is to estimate the distributions of parameters.
We propose a prior distribution for the betas and \(\sigma\), and update those distributions using a likelihood and the data.
Recall our model:
\[\text{satisfaction}_i = \beta_0 + \beta_1 \text{age}_i + \beta_2 \text{anxiety}_i + \beta_3 \text{illness}_i + \epsilon_i\]
where \(\epsilon_i \sim N(0,\sigma)\)
This implies
\[\text{satisfaction}_i \sim N(\beta_0 + \beta_1 \text{age}_i + \beta_2 \text{anxiety}_i + \beta_3 \text{illness}_i,\ \sigma)\]
This is our likelihood: A normal, or Gaussian, distribution with a conditional mean that changes based on age, anxiety and illness.
Where traditional statistics maximizes likelihood, Bayesian statistics multiplies the likelihood by the prior to get a posterior distribution.
stan_glm

The stan_glm function uses the same syntax as glm and provides weakly informative default prior distributions.
Instead of point estimates for the betas and \(\sigma\), we get posterior distributions.
stan_glm with explicit priors

Use the prior arguments to specify priors. Below are the default priors. The normal() and exponential() functions are from rstanarm.
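A sketch of the default priors written out explicitly, again assuming the hypothetical `ps` data frame. With autoscale = TRUE, rstanarm rescales these priors based on the data; prior_summary() shows the priors actually used.

```r
# Sketch: stan_glm with the default priors made explicit
library(rstanarm)

bm <- stan_glm(satisfaction ~ age + anxiety + illness, data = ps,
               prior_intercept = normal(0, 2.5, autoscale = TRUE),
               prior           = normal(0, 2.5, autoscale = TRUE),
               prior_aux       = exponential(1, autoscale = TRUE))

prior_summary(bm)   # view the priors after autoscaling
```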
We proceed the same way as before to evaluate and explore the model.
Let’s go to R!
In our previous model, we assumed the predictor effects were simply additive. For example, it didn’t matter how ill you were, the effect of age was always the same.
\[\text{satisfaction} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{anxiety} + \beta_3 \text{illness}\]
But we may have reason to believe the effects of age and illness interact. Perhaps the older you are, the smaller the effect of illness on patient satisfaction.
One way to describe this interaction is to add the product of age and illness to the model:
\[\text{satisfaction} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{anxiety} + \beta_3 \text{illness} + \beta_4 \text{age} \times \text{illness}\]
Use a colon to specify interactions in the model syntax.
Or use the asterisk as a shortcut: age * illness = age + illness + age:illness
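The interaction model can be sketched as follows, assuming the same hypothetical `ps` data frame.

```r
# Sketch: adding the age-by-illness interaction with a colon
bm2 <- stan_glm(satisfaction ~ age + anxiety + illness + age:illness,
                data = ps)

# Equivalent shortcut: age * illness expands to age + illness + age:illness
bm2 <- stan_glm(satisfaction ~ anxiety + age * illness, data = ps)

# 90% posterior intervals for the coefficients and sigma
posterior_interval(bm2)
```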
Once again the target of the Bayesian model is the collection of posterior distributions on the model weights, or coefficients.
                      5%           95%
(Intercept)   26.20181467  251.08202193
age           -3.42330550    2.16140201
illness       -2.33270503    2.20467749
anxiety      -25.84181972   -1.29440540
age:illness   -0.06550184    0.04424613
sigma          8.65894466   12.60598498
The interaction appears to be small and we’re uncertain whether it’s positive or negative.
The carData package contains data on the prestige of Canadian occupations from the early 1970s. “Prestige” is measured on a scale from 0 - 100, where higher values mean higher prestige.
We attempt to understand the variability of prestige by modeling it as a function of income, type of occupation (bc, wc, prof), and years of education using a Normal likelihood:
\[\text{prestige}_i \sim N(\beta_0 + \beta_1 \text{education} + \beta_2 \text{income} + \beta_3 \text{type} + \beta_4 \text{income} \times \text{type}, \sigma)\]
Note the interaction: this says the effect of income on prestige depends on type of occupation (and vice versa).
Fit the model using stan_glm() with default priors.
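One way to fit this model (the Prestige data are in the carData package):

```r
# Fit prestige as a function of education and an income-by-type interaction
library(rstanarm)
library(carData)

pm <- stan_glm(prestige ~ education + income * type, data = Prestige)

# 90% posterior intervals for the coefficients and sigma
posterior_interval(pm)
```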
The interactions seem negative but small.
                          5%         95%
(Intercept)      -14.866207    1.418732
education          2.092341    4.044889
income             0.002233    0.003955
typeprof          15.828666   34.037670
typewc            -1.944921   15.530580
income:typeprof   -0.003377   -0.001555
income:typewc     -0.002865   -0.000002
sigma              5.769690    7.388059
Interaction coefficients are hard to interpret.
To aid in interpretation we can use effect plots to help us visualize our models.
The basic idea is to generate predictions for various combinations of predictors and plot the result.
ggeffects to create effect plots

The ggeffects package provides methods for easily creating effect plots for models created with rstanarm.
The basic syntax to quickly generate an effect plot for our interaction:
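A minimal sketch, assuming a fitted stan_glm model with an age:illness interaction (the model name `bm2` is hypothetical):

```r
# Sketch: quick effect plot for the age:illness interaction
library(ggeffects)

plot(ggpredict(bm2, terms = c("age", "illness")))
```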
The following plot shows that the interaction between age and illness is virtually non-existent.
We can specify predictor values as follows:
We can also specify a range of values:
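In ggeffects, specific values and ranges go in brackets inside the terms argument. A sketch, assuming the hypothetical models `bm2` (age:illness) and `pm` (income:type) from earlier in the workshop; the particular values shown are for illustration only.

```r
library(ggeffects)

# Specific illness values (hypothetical values for illustration)
plot(ggpredict(bm2, terms = c("age", "illness [40, 50, 60]")))

# A range of income values, one line per occupation type
plot(ggpredict(pm, terms = c("income [1000:26000]", "type")))
```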
Effect plots are useful for main effects as well (i.e., predictors not involved in an interaction).
The 95% credible-interval ribbon is for the mean response value.
Let’s go to R!
Logistic regression models the probability of a binary outcome. The dependent variable is often of the form 0/1, failure/success or no/yes.
Like multiple regression, we model probability as a weighted sum of predictors.
Unlike multiple regression, there is no \(\sigma\), and the weighted sum of predictors is embedded in the logistic function:
\[P(Y=1) = \frac{1}{1 + \text{exp}(-X\beta)}\]
where \(X\beta = \beta_0 + \beta_1X_1 + \beta_2X_2 + \ldots + \beta_kX_k\)
This ensures our model always returns values between 0 and 1.
We can express the logistic regression model as a simple weighted sum of predictors by using the logit transformation:
\[\text{log} \left( \frac{P(Y = 1)}{1 - P(Y = 1)} \right) = \beta_0 + \beta_1X_1 + \ldots + \beta_kX_k\]
In this transformation, the response and coefficients are on the log odds scale.
This is the form logistic regression takes when performed in R (or any other program).
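The logistic function and its inverse, the logit, are simple to compute by hand; R also provides them as plogis() and qlogis(). The function names below are illustrative.

```r
# The logistic function maps any real number into (0, 1);
# the logit maps a probability back to the log odds scale.
logistic <- function(x) 1 / (1 + exp(-x))
logit    <- function(p) log(p / (1 - p))

logistic(0)   # 0.5: log odds of 0 corresponds to a probability of 0.5
logit(0.5)    # 0

# Same as R's built-in cumulative logistic distribution function
all.equal(logistic(1.3), plogis(1.3))   # TRUE
```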
Since we’re modeling the probability of an event happening, our response variable has a binomial distribution with n = 1:
\[Y \sim B\left(n = 1, p = \frac{1}{1 + \text{exp}(-X\beta)} \right)\]
While traditional statistics maximizes this likelihood, Bayesian statistics multiplies the likelihood by the prior to get a posterior distribution.
A clinical trial investigates a new treatment for rheumatoid arthritis. Model probability of seeing improvement in condition based on treatment, sex, and age.
  Treatment    Sex Age Better
1   Placebo Female  54      0
2   Treated Female  69      0
3   Treated   Male  27      1
4   Treated Female  62      1
5   Placebo   Male  44      0
6   Treated   Male  70      1
\[\text{log} \left( \frac{P(\text{Better} = 1)}{1 - P(\text{Better} = 1)} \right) = \beta_0 + \beta_1\text{Trt} + \beta_2\text{Sex} + \beta_3\text{Age}\]
The traditional method uses glm. Notice we set family = binomial since our response is binary.
The rstanarm specification is virtually identical, except we use stan_glm
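A sketch of both specifications, assuming a hypothetical data frame `arthritis` with columns Better (0/1), Treatment, Sex, and Age.

```r
# Sketch (hypothetical data): logistic regression, traditional and Bayesian
library(rstanarm)

# Traditional: glm with family = binomial for a binary response
gm  <- glm(Better ~ Treatment + Sex + Age, data = arthritis,
           family = binomial)

# Bayesian: identical syntax, just swap in stan_glm
bgm <- stan_glm(Better ~ Treatment + Sex + Age, data = arthritis,
                family = binomial)
```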
The default coefficient priors are the same as those used when performing multiple regression. The intercept has location = 0.
Recall Bayesian modeling does not return point estimates for the coefficients (the betas) but rather distributions. To get a coefficient value, we typically take the median or the mean of the posterior distribution.
The coef function returns the median values.
Exponentiating the coefficient value returns an odds ratio.
The odds that Better = 1 are about 6 times higher for the Treated group: \(\text{exp}(1.74) \approx 5.7\)
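A sketch of extracting coefficients and odds ratios, assuming the hypothetical fitted model `bgm` from above.

```r
# Median posterior coefficients, on the log odds scale
coef(bgm)

# Exponentiate to get odds ratios
exp(coef(bgm))

# e.g., a treatment coefficient of 1.74 on the log odds scale:
exp(1.74)   # about 5.7
```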
An effect plot can help communicate a model in terms of probability.
Let’s go to R!
Let’s say we have the following model:
\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\]
Do we need \(x_2\)? Maybe the following model is just as “good”?
\[y = \beta_0 + \beta_1 x_1\]
Or maybe a model with just \(x_2\) is better than a model with just \(x_1\).
\[y = \beta_0 + \beta_2 x_2\]
In traditional statistics, models are often compared using hypothesis tests or information criteria, such as AIC.
In Bayesian statistics, the widely applicable information criterion (WAIC) is often used.
It is relatively easy to implement with the waic function in the rstanarm package.
Information criteria in general provide an approximation of predictive accuracy (i.e., how accurately will our model perform with out-of-sample data?).
The criterion values by themselves are not meaningful. We compare them across different models and prefer smaller values.
Is the model with the smallest information criterion the “best” model? Not necessarily.
Think of comparing them like judging the results of a horserace. Winning by photo-finish doesn’t instill much confidence that the winning horse is always better. We’re more confident a horse is always better when they win by several seconds.
The well-known Akaike information criterion (AIC) assumes flat (non-informative) priors and that the posterior is approximately normal.
The widely applicable information criterion (WAIC) accommodates informative priors and makes no assumption about the shape of the posterior.
Say we have two models, m1 and m2. Using rstanarm we can compare as follows:
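A sketch, assuming `m1` and `m2` are fitted rstanarm models:

```r
# Sketch: compare two fitted rstanarm models by WAIC
library(rstanarm)

w1 <- waic(m1)
w2 <- waic(m2)
loo_compare(w1, w2)

# If WAIC warns about unreliable estimates, use PSIS-LOO instead
loo_compare(loo(m1), loo(m2))
```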
loo_compare output

The output reports the difference in expected log predictive density (ELPD) along with the standard error of the difference.
The difference in ELPD will be negative if the expected out-of-sample predictive accuracy of the first model is higher. If the difference is positive, then the second model is preferred.
The standard error of the difference, se_diff, gives us some idea of how much “better” the winning model really is.
loo versus waic

McElreath (2016) recommends using WAIC for model selection. Vehtari, Gelman, and Gabry (2017) recommend leave-one-out cross-validation (LOO). In their words: “Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations.”
PSIS stands for Pareto-smoothed importance sampling, which is used in the computation of LOO.
The loo_compare function will advise us when we should consider using LOO instead of WAIC.
Let’s go to R!
rstanarm

Espe, M. et al. (2016) Yield gap analysis of US rice production systems shows opportunities for improvement. Field Crops Research, 196:276-283.
Kubrak, O. et al. (2017) Adaptation to fluctuating environments in a selection experiment with Drosophila melanogaster. Ecology and Evolution, 7:3796-3807.
Herzog, S. et al. (2017) Sun Protection Factor Communication of Sunscreen Effectiveness. JAMA Dermatology, 153(3):348-350.
Kovic, M. and Hänsli, N. (2017) The impact of political cleavages, religiosity, and values on attitudes towards nonprofit organizations. Social Sciences, 7(1), 2.
Schnase, J. L., Carroll, M. L., Montesano, P. M., & Seamster, V. A. (2025, September 1). Shifts in breeding phenology for Cassin’s Sparrow (Peucaea cassinii) over four decades. Journal of Field Ornithology, 96(3), 1 - 11. Retrieved from https://doi.org/10.5751/JFO-00691-960303.
Gelman, A., Hill, J, & Vehtari, A. (2020) Regression and Other Stories. Cambridge University Press.
Goodrich B, Gabry J, Ali I & Brilleman S. (2025). rstanarm: Bayesian applied regression modeling via Stan. R package version 2.32.2 https://mc-stan.org/rstanarm.
McElreath, R. (2016). Statistical Rethinking. CRC Press, Boca Raton.
Muth, C., Oravecz, Z., & Gabry, J. (2018). User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods for Psychology, 14(2), 99-119.
Nicenboim, B., Schad, D. , and Vasishth, S. (2021) An Introduction to Bayesian Data Analysis for Cognitive Science. https://vasishth.github.io/bayescogsci/book/
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413-1432.
For free statistical consulting, contact us: statlab@virginia.edu
Sign up for more workshops or see past workshops:
https://library.virginia.edu/data/training
Register for the Research Data Services newsletter to be notified of new workshops:
https://library.virginia.edu/data/newsletters